The purpose of this tutorial is to get you comfortable with editing code within the RStudio interface and to familiarize you with a few functions that are helpful for visualizing data. I’m assuming you’ve already installed R and RStudio on your computer, but that you have minimal prior experience with programming in any language.
My goal isn’t for you to leave this session knowing how to do everything you would like to do. My goal is for you to leave this session with questions that you’re excited to figure out the answers to.
Before we get started, let’s take a little tour of the RStudio interface.
Start a new RMarkdown file to follow along with the rest of this session.
Most of the functionality of R comes from packages. There are over 10,000 R packages available for a variety of uses, and some are quite specialized. Today, we will be using:

- tidyverse: a collection of packages that’s really useful for data visualization and data cleaning
- sf: a set of tools for working with spatial data consistent with tidyverse methods
- spData: a bunch of datasets for people to practice spatial data analysis
- ggthemes: some nice themes you can use to quickly customize figures
- ggspatial: some extra tools for creating attractive and useful maps.

You will need to install these packages if you haven’t already. Regardless of whether they’re already installed, you’ll need to load them using the library() function.
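If you aren’t sure which of these packages you already have, a short sketch like this one installs only the missing ones. It uses base R’s installed.packages() and install.packages(). The names pkgs and missing_pkgs are just illustrative.

```r
# The five packages used in this session
pkgs <- c("tidyverse", "sf", "spData", "ggthemes", "ggspatial")

# Keep only the packages that aren't installed on this computer yet
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]

# Install the missing ones (this needs an internet connection)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Installing is a one-time step per computer. You still have to run library() in every new R session.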
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
library(ggspatial)
The house dataset is included with the spData package.
I’m going to load the dataset and save it to a data frame called homes. The st_as_sf() function converts the data into the sf format.
homes <- st_as_sf(house)
Once the data is loaded, you’ll see it in your Environment tab. It contains data on 25,357 single-family homes sold in Lucas County, Ohio, between 1993 and 1998, based on data from the county auditor. (The data are taken from James P. LeSage’s Spatial Econometrics Toolbox for MATLAB.) The dataset includes the following variables:
Exercise: Click on the name of the dataset in your Environment tab, or type View(homes) in your console, to view the data in a spreadsheet format.
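You can also check the data from code. The sketch below confirms that st_as_sf() produced an sf object, counts the rows, and lists the variable names, so you can see which columns are available to plot. The glimpse() function comes from dplyr, which loads with tidyverse.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# An sf object is a data frame with an extra geometry column
class(homes)

# Number of homes (rows) in the dataset
nrow(homes)

# Names of all the variables, including the geometry column
names(homes)

# A compact overview: each column, its type, and its first few values
glimpse(homes)
```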
The ggplot2 package is part of the tidyverse and was developed by Hadley Wickham. It offers a powerful set of tools for visualizing data. Its approach, which Wickham calls “a layered grammar of graphics,” lets you build a graphic by adding layers of commands.
For the first “layer,” you’ll call the ggplot() function, which sets up the plot and indicates that we’re working with the homes dataset we’ve loaded. Then you can add a layer that represents the data using geom_point(). Inside the aes() function, you’ll specify which variable goes on the x-axis and which goes on the y-axis.
We’ll make a quick scatterplot showing the year the home was built on the x-axis and the price of the home on the y-axis.
ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price))
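Because ggplot() returns an object, you can also build a plot one layer at a time and store it in a variable. In this sketch, the name p is arbitrary and the axis titles are just example wording. The plot gets a points layer, and then labs() replaces the default axis titles.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# Layer 1: set up the plot with the data
p <- ggplot(homes)

# Layer 2: add points, mapping yrbuilt to x and price to y
p <- p + geom_point(aes(x = yrbuilt, y = price))

# Replace the default axis titles (the variable names)
p <- p + labs(x = "Year built", y = "Sale price")

# Printing the object draws the plot
print(p)
```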
Exercise: In your own RStudio session, experiment with plotting other pairs of variables.
I can represent additional variables with color and size. For example, I might have different colors represent different types of buildings (the stories variable).
ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories))
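Before mapping a variable to color, it helps to know what values it takes. If stories is a categorical (factor) column, ggplot2 uses a discrete palette with one color per category. If it were numeric, you would get a continuous color gradient instead. This sketch counts how many homes fall into each value. st_drop_geometry() removes the geometry column so the result is a plain table.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# How is stories stored? ("factor" means discrete colors)
class(homes$stories)

# Number of homes in each stories category
story_counts <- homes %>%
  st_drop_geometry() %>%
  count(stories)

story_counts
```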
And I might use the sizes of the points to represent each home’s square footage (the TLA variable).
ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories, size = TLA))
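Legend titles default to the variable names, so readers see “TLA” rather than something meaningful. The labs() function can rename the axes and the legends. The wording below is only an example.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

p_labeled <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories, size = TLA)) +
  # Each name in labs() matches an aesthetic: x, y, color, size
  labs(
    x = "Year built",
    y = "Sale price",
    color = "Stories",
    size = "Total living area (sq ft)"
  )

print(p_labeled)
```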
It’s sort of hard to see what’s going on with all those dots on top of each other, so you might want to make them all a little bit transparent using the alpha argument.
Note: Characteristics that represent variables should go inside the aes() function. Characteristics you want to apply equally to all points should go outside the aes() function.
ggplot(homes) +
  geom_point(aes(x = yrbuilt,
                 y = price,
                 color = stories,
                 size = TLA),
             alpha = 0.25)
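To see the inside-versus-outside distinction from the note above, compare these three plots. The color "steelblue" is just an example. The third plot shows a common mistake. A constant inside aes() is treated as a variable with one category, so the points get the default palette color (not steel blue) and a legend entry labelled “steelblue.”

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# Mapped: color varies with the stories variable, so it goes inside aes()
p_mapped <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories))

# Set: every point gets the same color and transparency, so they go outside aes()
p_set <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price), color = "steelblue", alpha = 0.25)

# Mistake: a constant inside aes() becomes a one-category variable
p_mistake <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = "steelblue"))

print(p_mapped)
print(p_set)
print(p_mistake)
```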
Exercise: In your own RStudio session, experiment with representing two or more of these variables in a variety of different ways.
If you want to go way beyond simple scatterplots, feel free to peruse the ggplot cheat sheet for inspiration.
You might also apply a theme to your scatterplot if you don’t like the default appearance. Here is the same scatterplot using a more minimalist theme.
ggplot(homes) +
  geom_point(aes(x = yrbuilt,
                 y = price,
                 color = stories,
                 size = TLA),
             alpha = 0.25) +
  theme_bw()
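If you want the same theme for every plot in your session, theme_set() changes the default so you don’t have to add theme_bw() each time. It returns the previous theme, which you can pass back to theme_set() to undo the change.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# Make theme_bw() the default for every plot that follows; keep the old default
old_theme <- theme_set(theme_bw())

# No theme_bw() needed here: the new default applies
p_bw <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories, size = TLA),
             alpha = 0.25)
print(p_bw)

# Restore the theme that was the default before
theme_set(old_theme)
```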
And here it is using a theme inspired by plots that appear in the Wall Street Journal.
ggplot(homes) +
  geom_point(aes(x = yrbuilt,
                 y = price,
                 color = stories,
                 size = TLA),
             alpha = 0.25) +
  theme_wsj()
Exercise: Experiment with applying a few different themes to your scatterplot.
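To compare several themes quickly, you can store a base plot once and add each theme in a loop. The ggthemes functions used here, theme_economist(), theme_fivethirtyeight(), theme_tufte(), and theme_few(), are only a few of the package’s options. The list and its names are arbitrary.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
homes <- st_as_sf(house)

# The plot without any theme added
base_plot <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories), alpha = 0.25)

# A named list of ggthemes themes to try
themes_to_try <- list(
  economist = theme_economist(),
  fivethirtyeight = theme_fivethirtyeight(),
  tufte = theme_tufte(),
  few = theme_few()
)

# Inside a loop, plots must be printed explicitly to appear
for (name in names(themes_to_try)) {
  print(base_plot + themes_to_try[[name]] + ggtitle(name))
}
```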
Exercise: Create an attractive scatterplot using the house dataset and share it with a small group of three or four classmates.
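One way to share your plot is to save it as an image file with ggsave(). The file name, size, and resolution below are placeholders to adjust. By default, width and height are in inches.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

my_plot <- ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories), alpha = 0.25) +
  theme_bw()

# Writes a PNG file to your working directory (check it with getwd())
ggsave("my_scatterplot.png", plot = my_plot, width = 7, height = 5, dpi = 300)
```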
A scatter plot is useful for showing how two or more variables relate to one another, but we may also be interested in how they vary across space. This dataset includes spatial information, so we can plot it on a map using geom_sf().
Let’s create a map showing how the price of a single-family home varies across space.
ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5)
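geom_sf() works because each row of homes carries a geometry, and you can inspect that geometry directly. st_geometry_type() reports the kind of shape (a point for each home). st_crs() reports the coordinate reference system the coordinates are stored in.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
homes <- st_as_sf(house)

# Each home is stored as a single point
table(st_geometry_type(homes))

# The coordinate reference system (projection) of the data
st_crs(homes)

# The first few coordinates as an X/Y matrix
head(st_coordinates(homes))
```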
An appropriate theme for many maps is theme_map() from the ggthemes package.
ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5) +
  theme_map()
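The default blue gradient can make prices hard to tell apart. One option (a design choice, not the only one) is ggplot2’s viridis scale, scale_color_viridis_c(). In this sketch its legend labels are formatted as dollars by the scales package, which is installed alongside ggplot2. The legend title is example wording.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
homes <- st_as_sf(house)

p_price <- ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5) +
  # Continuous viridis palette; legend values shown as dollar amounts
  scale_color_viridis_c(labels = scales::label_dollar(), name = "Sale price") +
  theme_map()

print(p_price)
```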
You might want to orient your viewer using a basemap. The ggspatial package has a few you can choose from. Here’s the default.
ggplot(homes) +
  annotation_map_tile(zoomin = 0, progress = "none") +
  geom_sf(aes(color = price), alpha = 0.5) +
  theme_map()
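ggspatial also provides annotation_scale() (a scale bar) and annotation_north_arrow(), which help orient readers with or without a basemap. In the location arguments, "bl" and "tr" mean bottom-left and top-right. The positions and arrow style are just one choice. This sketch assumes homes has a coordinate reference system set (st_crs(homes) will show it), because the scale bar relies on one to measure distances.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
library(ggspatial)
homes <- st_as_sf(house)

p_oriented <- ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5) +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl") +
  # North arrow in the top-right corner
  annotation_north_arrow(location = "tr",
                         style = north_arrow_fancy_orienteering()) +
  theme_map()

print(p_oriented)
```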
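I also really like the black-and-white Stamen basemap. (Stamen’s map tiles moved to Stadia Maps in 2023, so the "stamenbw" tiles may no longer load for you. If they don’t, try another type, such as "cartolight".)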
ggplot(homes) +
  annotation_map_tile(zoomin = 0, progress = "none", type = "stamenbw") +
  geom_sf(aes(color = price), alpha = 0.5) +
  theme_map()
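If the black-and-white Stamen tiles don’t load for you, a similar light look is available from the "cartolight" type listed by rosm::osm.types(). This sketch changes only the type argument. Drawing the map downloads tiles, so it needs an internet connection.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
library(ggspatial)
homes <- st_as_sf(house)

# Same map as above, with a different tile type
p_carto <- ggplot(homes) +
  annotation_map_tile(zoomin = 0, progress = "none", type = "cartolight") +
  geom_sf(aes(color = price), alpha = 0.5) +
  theme_map()
```

Type p_carto in your console to draw the map. Printing is the step that fetches the tiles.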
annotation_map_tile() is a function from the ggspatial package that downloads base map tiles from OpenStreetMap and other tile providers. You can see a list of available base map types using rosm::osm.types():
rosm::osm.types()
## [1] "osm" "opencycle" "hotstyle"
## [4] "loviniahike" "loviniacycle" "hikebike"
## [7] "hillshade" "osmgrayscale" "stamenbw"
## [10] "stamenwatercolor" "osmtransport" "thunderforestlandscape"
## [13] "thunderforestoutdoors" "cartodark" "cartolight"
You can modify the appearance of the points on a map the same way you would for a scatterplot.
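For example, the size and alpha arguments used for the scatterplot work the same way with geom_sf(). This sketch maps square footage (TLA) to point size and makes every point semi-transparent. The specific size range and alpha value are arbitrary.

```r
# Setup from earlier in this tutorial
library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
homes <- st_as_sf(house)

p_map_points <- ggplot(homes) +
  # Mapped inside aes(): color and size; set outside aes(): transparency
  geom_sf(aes(color = price, size = TLA), alpha = 0.3) +
  # Keep points small so 25,000+ homes don't blot out the map
  scale_size_continuous(range = c(0.2, 2)) +
  theme_map()

print(p_map_points)
```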
Exercise: Create an interesting map based on the house dataset and share it with a group of three or four classmates.